|
Histograms are most commonly used as visual representations of data. However, database systems also use histograms internally to summarize data and to provide size estimates for queries. Because these histograms are not presented to users or displayed visually, a wider range of options is available for their construction. Any histogram, simple or exotic, is defined by four parameters: Sort Value, Source Value, Partition Class and Partition Rule. The most basic histogram is the equi-width histogram, in which each bucket represents the same range of values. Such a histogram has a Sort Value of Value, a Source Value of Frequency, belongs to the Serial Partition Class, and has a Partition Rule stating that all buckets have the same range.

V-optimal histograms are an example of a more "exotic" histogram. V-optimality is a Partition Rule which states that the bucket boundaries are to be placed so as to minimize the cumulative weighted variance of the buckets. Implementing this rule is a complex problem, and constructing these histograms is a correspondingly complex process.

==Definition==
A v-optimal histogram is based on the concept of minimizing a quantity which is called the ''weighted variance'' in this context.〔Poosala et al. (1996)〕 This is defined as
: <math>W = \sum_{j=1}^{J} n_j V_j ,</math>
where the histogram consists of ''J'' bins or buckets, ''n<sub>j</sub>'' is the number of items contained in the ''j''th bin, and ''V<sub>j</sub>'' is the variance of the values associated with the items in the ''j''th bin.

Excerpt source: Wikipedia, the free encyclopedia.
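The weighted variance described in the definition (the sum, over all buckets, of the item count ''n<sub>j</sub>'' times the variance ''V<sub>j</sub>'') can be illustrated with a short sketch. The code below is a minimal example under stated assumptions, not the construction used by the cited authors: it assumes ''V<sub>j</sub>'' is the population variance, that the input is a list of source values (for example frequencies) already arranged in sort order, and that buckets must be contiguous runs of that list. The function names <code>weighted_variance</code> and <code>v_optimal_partition</code> are hypothetical, and the boundary search is a straightforward dynamic program that takes time proportional to ''J'' times the square of the number of values.

```python
from statistics import pvariance


def weighted_variance(buckets):
    """Return W = sum of n_j * V_j over the buckets.

    Assumption: V_j is the population variance of the values in bucket j.
    """
    return sum(len(b) * pvariance(b) for b in buckets if b)


def v_optimal_partition(values, num_buckets):
    """Split `values` into `num_buckets` contiguous buckets minimizing W.

    Assumption: `values` is already ordered by the histogram's sort value.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums of values and squared values give each bucket cost in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for idx, v in enumerate(values):
        prefix[idx + 1] = prefix[idx] + v
        prefix_sq[idx + 1] = prefix_sq[idx] + v * v

    def cost(i, j):
        # n_j * V_j for values[i:j], i.e. the sum of squared deviations.
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal W when the first j values are split into k buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    cut[k][j] = i

    # Walk the recorded cut points backwards to recover the buckets.
    buckets = []
    j = n
    for k in range(num_buckets, 0, -1):
        i = cut[k][j]
        buckets.append(list(values[i:j]))
        j = i
    buckets.reverse()
    return buckets


example = v_optimal_partition([1, 2, 3, 10, 11, 12, 30], 3)
print(example, weighted_variance(example))
```

For the sample input, the boundaries fall between the three clusters of values, whereas an equi-width rule would place them at fixed intervals regardless of how the values are distributed.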
|